Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add CASM serialization of Cairo programs #5912

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

zmalatrax
Copy link

@zmalatrax zmalatrax commented Jun 28, 2024

This PR adds a new command sierra-compile-json which outputs a JSON with everything needed to run a Cairo v2.6.4 program in any environment (i.e. outside of Rust).

Rationale: There is currently no straightforward way to execute a Cairo program outside of a Rust project. It is much needed to provide other implementations of the Cairo VM a way to run Cairo programs the same way StarkNet contracts can.

I believe that such feature belongs to the Cairo compiler rather than an external project (e.g. Universal-Sierra-Compiler).

It follows the same structure as *.compiledContractClass.json output from the command starknet-sierra-compile.
The fields of the *.casm.json json output are:

  • prime
  • compiler_version
  • bytecode
  • hints
  • pythonic_hints
  • entrypoint
  • builtins

The last two fields are the exact same as offset and builtins from the elements of entry_points_by_type (besides the selector)

When using the command sierra-compile-json, a flag gas-usage-check enables the GasBuiltin, as this builtin is not mandatory for Cairo programs(?)

I haven't added the bytecode_segment_lengths as it is only used in StarkNet contracts with multiple entrypoints (to be confirmed, might be a wrong assumption).

Example *.casm.json outputs can be found here


This change is Reviewable

@zmalatrax zmalatrax marked this pull request as draft June 28, 2024 17:57
@zmalatrax zmalatrax marked this pull request as ready for review July 1, 2024 08:25
@ilyalesokhin-starkware
Copy link
Contributor

crates/bin/sierra-compile-json/src/main.rs line 22 at r1 (raw file):

    /// Add pythonic hints.
    #[arg(long, default_value_t = false)]
    add_pythonic_hints: bool,

is this needed?

Code quote:

    #[arg(long, default_value_t = false)]
    add_pythonic_hints: bool,

Copy link
Contributor

@ilyalesokhin-starkware ilyalesokhin-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 22 files at r1, all commit messages.
Reviewable status: 3 of 22 files reviewed, 2 unresolved discussions (waiting on @zmalatrax)


crates/cairo-lang-casm/src/ap_change.rs line 12 at r1 (raw file):

mod test;

#[derive(Copy, Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]

why is this needed?

Code quote:

, Serialize, Deserialize

@zmalatrax zmalatrax force-pushed the feat/main-casm-json branch from a118f52 to c9c219a Compare July 4, 2024 13:42
@zmalatrax
Copy link
Author

crates/bin/sierra-compile-json/src/main.rs line 22 at r1 (raw file):

    /// Add pythonic hints.
    #[arg(long, default_value_t = false)]
    add_pythonic_hints: bool,

is this needed?

No it can be removed (was thinking of making the pythonic hints optional at first)

@zmalatrax
Copy link
Author

crates/cairo-lang-casm/src/ap_change.rs line 12 at r1 (raw file):

mod test;

#[derive(Copy, Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)]

why is this needed?

Code quote:

, Serialize, Deserialize

Before returning the builtins and entrypoint fields I was serializing the ::main Sierra Function where I needed to add all these serde traits.

So it is not needed anymore, cleaned up

@zmalatrax
Copy link
Author

I've added two fields, input_args_type and return_type:

  • input_args_type: Array containing the (explicit) arguments type of the main function, if any.
  • return_type: Array containing the return type of the main, if any
    Otherwise, these fields are empty arrays.

Copy link
Member

@mkaput mkaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 22 files reviewed, 3 unresolved discussions (waiting on @ilyalesokhin-starkware and @zmalatrax)

a discussion (no related file):
TBH The fact that you are adding custom stuff or doing other adjustments because something changed in your project is a strong argument for me that this repository is not a good place for this code.

I do understand your rationale but I am very afraid that the output format is very usage-specific.

Also, the very existence of Cairo Native (an external project) makes me think whether CASM has to be considered an important medium for execution.

But I am not involved in this topic so I don’t enforce this.


@ClementWalter
Copy link

Given the current schema

Cairo > starknet

prime
compiler_version
bytecode
hints
pythonic_hints
entry_points_by_type

Cairo0

attributes
builtins
compiler_version
data
debug_info
hints
identifiers
main_scope
prime
reference_manager

The proposition currently is

prime
compiler_version
bytecode
hints
pythonic_hints
entrypoint
builtins
input_args_type
return_type

I'm wondering if we could keep the list of entry points and not only keep the main one, so that one compiled file can have several entrypoints

Copy link
Collaborator

@piotmag769 piotmag769 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 22 files at r1.
Reviewable status: 3 of 22 files reviewed, 3 unresolved discussions (waiting on @ilyalesokhin-starkware and @mkaput)

a discussion (no related file):

TBH The fact that you are adding custom stuff or doing other adjustments because something changed in your project is a strong argument for me that this repository is not a good place for this code.

Strongly agree.

A few things:

  1. It seems to me that you are trying to create a non-contract CASM format that would be exposed in public compiler API and then used by other project (mainly non-rust VMs), is that right? If so then I'm all for it, but:
  • did you consult it with developers of other projects? I think we don't want to end up with a format that is specific for your project and resides in the compiler repo
  • are you sure the information in the format are sufficient and the format won't need breaking changes?
  1. This PR seems to contain some weirdly specific logic e.g. extracting "::main" function entrypoint (why this one? What about other functions?). Can you explain a rationale behind this?

@zmalatrax
Copy link
Author

  1. It seems to me that you are trying to create a non-contract CASM format that would be exposed in public compiler API and then used by other project (mainly non-rust VMs), is that right? If so then I'm all for it, but:

Yes, the goal is to have a standardized format that would be used by the different VM. It can also benefits the Rust VM, as it currently takes a .cairo or .sierra file and perform these compilation steps before running the program.

  • did you consult it with developers of other projects? I think we don't want to end up with a format that is specific for your project and resides in the compiler repo

Totally agree that the format should be project-agnostic.
I've shared with the other Cairo VM projects this PR, and I've been talking about such artifacts with @TAdev0 and @rodrigo-pino from NetherMind (cairo-vm-go) but not about the actual format yet.

  • are you sure the information in the format are sufficient and the format won't need breaking changes?

In its current state I'm not 100% convinced that it is sufficient.
For example the entrypoints and the data to return about explicit input arguments and the return type (only their type or also their size?, should the pythonic hints should be optional? (defaulting to false), etc).

I've opened this PR to start the discussion on a 'non-contract CASM' format standard, providing a basis to iterate over.

  1. This PR seems to contain some weirdly specific logic e.g. extracting "::main" function entrypoint (why this one? What about other functions?). Can you explain a rationale behind this?

At first I thought that the sole entrypoint of a Cairo program would be its main function, but this is inaccurate.
So every functions should be included, grouped in a similar way as the entry_points_by_types in starknet-sierra-compile

We could have a similar field, such as entry_points:

{
  "hints": [],
  "entry_points": {
    "main": {
      "builtins": [],
      "offset": 0,
      "input_args_type": [],
      "return_type": []
    },
    "foo": {},
    "bar": {}
  }
}

Copy link
Collaborator

@piotmag769 piotmag769 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should reach a consensus about the format first, then we can implement it. I think USC would be also interested in the format cc @Arcticae

Reviewable status: 3 of 22 files reviewed, 3 unresolved discussions (waiting on @ilyalesokhin-starkware and @mkaput)

Copy link
Member

@mkaput mkaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened this PR to start the discussion on a 'non-contract CASM' format standard, providing a basis to iterate over.

Unfortunately, this is not the best way to initiate a discussion because PRs tend to only be noticed by people who are explicitly mentioned there or some nerds like us who are watching everything. 😃


I'm thinking about your case and I guess the most effective approach for you could be to:

  1. Write a prototype for this tool separately on your premises
  2. Validate it works for your VM -- so you'll have a working solution this early
  3. Reach out to other parties doing similar stuff and ask them if your tool fills their needs. Ensure there's consensus.
  4. Then standardize it

You can technically make a prototype in this PR and if @ArielElp and @orizi agree, you can try shipping this for 2.8.0 (2.7.0 is feature freezed) as alpha and then stabilize it in >=2.9.0. This way you somehow skip the 4th step.

Reviewable status: 3 of 22 files reviewed, 3 unresolved discussions (waiting on @ilyalesokhin-starkware and @zmalatrax)

@ilyalesokhin-starkware
Copy link
Contributor

crates/cairo-lang-sierra/src/program.rs line 154 at r3 (raw file):

            .iter()
            .find(|f| {
                if let Some(name) = &f.id.debug_name {

Can we avoid using the debug_name here?

Code quote:

debug_name

@ilyalesokhin-starkware
Copy link
Contributor

crates/cairo-lang-sierra-to-casm/src/compiler.rs line 181 at r3 (raw file):

        for type_id in main_func.signature.param_types.iter() {
            let debug_name = match type_id.debug_name.clone() {

This is supposed to be used for debugging.
it is better not to relay on that.

Code quote:

debug_name

@zmalatrax
Copy link
Author

Can we avoid using the debug_name here?
This is supposed to be used for debugging.
it is better not to relay on that.

Definitely, I'll propose something similar to what is done in casm_contract_class.rs to extract builtins

I was also thinking about having a similar selector field which would encapsulate the signature params but I still need to put more thoughts on it

@mkaput
Copy link
Member

mkaput commented Jul 9, 2024

@zmalatrax please use Reviewable for responding to comments :)

Copy link
Member

@mkaput mkaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not an authoritative review, just marking this for myself as done

Reviewed 11 of 22 files at r1, 5 of 7 files at r2, 2 of 3 files at r4, 2 of 3 files at r7, 1 of 3 files at r8, 2 of 3 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ilyalesokhin-starkware and @zmalatrax)

@zmalatrax zmalatrax force-pushed the feat/main-casm-json branch from 1bb57ff to efa25e4 Compare August 7, 2024 10:10
Copy link
Member

@mkaput mkaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r11, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ilyalesokhin-starkware and @zmalatrax)

@ilyalesokhin-starkware
Copy link
Contributor

crates/cairo-lang-starknet-classes/src/casm_contract_class.rs line 204 at r1 (raw file):

/// Context for resolving types.
pub struct TypeResolver<'a> {

Moving this to a new file can be a separate PR, right?

Code quote:

pub struct TypeResolver<'a> {

@zmalatrax zmalatrax force-pushed the feat/main-casm-json branch from efa25e4 to 1ead3e3 Compare October 24, 2024 09:57
Copy link
Author

@zmalatrax zmalatrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 25 files reviewed, 6 unresolved discussions (waiting on @ilyalesokhin-starkware and @mkaput)


crates/cairo-lang-starknet-classes/src/casm_contract_class.rs line 204 at r1 (raw file):

Previously, ilyalesokhin-starkware wrote…

Moving this to a new file can be a separate PR, right?

Yes, but that PR would be needed for this one then
(CasmCairoProgram::new() uses TypeResolver, making cairo-lang-sierra-to-casmdependent of the cairo-lang-starknet-classes otherwise)

@zmalatrax zmalatrax force-pushed the feat/main-casm-json branch from ff19826 to cf7b980 Compare October 25, 2024 08:48
Copy link
Collaborator

@piotmag769 piotmag769 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 23 files at r1, 3 of 7 files at r2, 1 of 3 files at r7, 7 of 17 files at r12, 10 of 13 files at r13, all commit messages.
Reviewable status: 22 of 25 files reviewed, 7 unresolved discussions (waiting on @ilyalesokhin-starkware, @mkaput, and @zmalatrax)


crates/cairo-lang-sierra-to-casm/src/compiler.rs line 154 at r13 (raw file):

#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct CasmCairoArg {
    pub generic_id: GenericTypeId,

IIRC you wanted the new format to be compiler-agnostic, so I don't see a benefit of passing GenericTypeId here. How would you want to use it? Isn't debug name enough?

Maybe the assumption changed or I don't remember it correctly, so feel to correct me.

Copy link
Author

@zmalatrax zmalatrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 22 of 25 files reviewed, 7 unresolved discussions (waiting on @ilyalesokhin-starkware, @mkaput, and @piotmag769)


crates/cairo-lang-sierra-to-casm/src/compiler.rs line 154 at r13 (raw file):

Previously, piotmag769 (Piotr Magiera) wrote…

IIRC you wanted the new format to be compiler-agnostic, so I don't see a benefit of passing GenericTypeId here. How would you want to use it? Isn't debug name enough?

Maybe the assumption changed or I don't remember it correctly, so feel to correct me.

Indeed, debug_name provides the needed information, generic_id is redundant.
At first I wanted to avoid relying on debug_name (used by the cairo-vm though), thus trying to rely on generic_id but it actually does not provide as much information as debug_name.

@zmalatrax zmalatrax force-pushed the feat/main-casm-json branch from a5ace5b to 87b3063 Compare October 25, 2024 15:39
Copy link
Member

@mkaput mkaput left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 13 files at r13, 2 of 2 files at r14, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @ilyalesokhin-starkware, @piotmag769, and @zmalatrax)

@Arcticae Arcticae requested a review from piotmag769 October 29, 2024 11:09
Copy link
Collaborator

@Arcticae Arcticae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @ilyalesokhin-starkware, @piotmag769, and @zmalatrax)

a discussion (no related file):
Just to be aligned on this - what's the plan for the stabilization of this, and next steps?
I see benefit of finally converging to a common format that all people can use (which would be in the interest of potential and current USC users for broader adoption), but i'm not sure what the next steps are here.

I am kind of afraid of a scenario where we build on this concept here and it's set in stone, whereas USC provides the same (essentially) functionality with more flexibility (you can change the CASM format in a semver-compatible manner there as well).



crates/cairo-lang-sierra-to-casm/src/compiler.rs line 169 at r14 (raw file):

impl CasmCairoProgram {
    pub fn new(

nitpick: This function is not legible enough (too long) for my taste

@cicr99
Copy link

cicr99 commented Dec 4, 2024

Hi @Arcticae , I work on the Cairo VM in Go being developed by Nethermind. From our side, we were hoping this PR would get merged so that we could use the serialized output for Cairo 1 programs as input for our VM. Currently, we are not using the output provided by USC because it has missing information that is required for the execution, such as the builtins and the entry_points. For us, adding this functionality here makes sense from a usability perspective, specially since it's already generated as part of the existing pipeline and it allows to obtain all necessary components from a single source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants